<!doctype html>
<html>
  <head>
    <!-- MathJax -->
    <script type="text/javascript"
      src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
    </script>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="chrome=1">
    <title>
      Caffe | Loss
    </title>

    <link rel="icon" type="image/png" href="/images/caffeine-icon.png">

    <link rel="stylesheet" href="/stylesheets/reset.css">
    <link rel="stylesheet" href="/stylesheets/styles.css">
    <link rel="stylesheet" href="/stylesheets/pygment_trac.css">

    <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
    <!--[if lt IE 9]>
    <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
    <![endif]-->
  </head>
  <body>
  <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-46255508-1', 'daggerfs.com');
    ga('send', 'pageview');
  </script>
    <div class="wrapper">
      <header>
        <h1 class="header"><a href="/">Caffe</a></h1>
        <p class="header">
          Deep learning framework by <a class="header name" href="http://bair.berkeley.edu/">BAIR</a>
        </p>
        <p class="header">
          Created by
          <br>
          <a class="header name" href="http://daggerfs.com/">Yangqing Jia</a>
          <br>
          Lead Developer
          <br>
          <a class="header name" href="http://imaginarynumber.net/">Evan Shelhamer</a>
        <ul>
          <li>
            <a class="buttons github" href="https://github.com/BVLC/caffe">View On GitHub</a>
          </li>
        </ul>
      </header>
      <section>

      <h1 id="loss">Loss</h1>

<p>In Caffe, as in most of machine learning, learning is driven by a <strong>loss</strong> function (also known as an <strong>error</strong>, <strong>cost</strong>, or <strong>objective</strong> function).
A loss function specifies the goal of learning by mapping parameter settings (i.e., the current network weights) to a scalar value specifying the  “badness” of these parameter settings.
Hence, the goal of learning is to find a setting of the weights that <em>minimizes</em> the loss function.</p>

<p>The loss in Caffe is computed by the Forward pass of the network.
Each layer takes a set of input (<code class="highlighter-rouge">bottom</code>) blobs and produces a set of output (<code class="highlighter-rouge">top</code>) blobs.
Some of these layers’ outputs may be used in the loss function.
A typical choice of loss function for one-versus-all classification tasks is the <code class="highlighter-rouge">SoftmaxWithLoss</code> function, used in a network definition as follows, for example:</p>

<div class="highlighter-rouge"><pre class="highlight"><code>layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
}
</code></pre>
</div>

<p>In a <code class="highlighter-rouge">SoftmaxWithLoss</code> function, the <code class="highlighter-rouge">top</code> blob is a scalar (empty shape) which averages the loss (computed from predicted labels <code class="highlighter-rouge">pred</code> and actuals labels <code class="highlighter-rouge">label</code>) over the entire mini-batch.</p>

<h3 id="loss-weights">Loss weights</h3>

<p>For nets with multiple layers producing a loss (e.g., a network that both classifies the input using a <code class="highlighter-rouge">SoftmaxWithLoss</code> layer and reconstructs it using a <code class="highlighter-rouge">EuclideanLoss</code> layer), <em>loss weights</em> can be used to specify their relative importance.</p>

<p>By convention, Caffe layer types with the suffix <code class="highlighter-rouge">Loss</code> contribute to the loss function, but other layers are assumed to be purely used for intermediate computations.
However, any layer can be used as a loss by adding a field <code class="highlighter-rouge">loss_weight: &lt;float&gt;</code> to a layer definition for each <code class="highlighter-rouge">top</code> blob produced by the layer.
Layers with the suffix <code class="highlighter-rouge">Loss</code> have an implicit <code class="highlighter-rouge">loss_weight: 1</code> for the first <code class="highlighter-rouge">top</code> blob (and <code class="highlighter-rouge">loss_weight: 0</code> for any additional <code class="highlighter-rouge">top</code>s); other layers have an implicit <code class="highlighter-rouge">loss_weight: 0</code> for all <code class="highlighter-rouge">top</code>s.
So, the above <code class="highlighter-rouge">SoftmaxWithLoss</code> layer could be equivalently written as:</p>

<div class="highlighter-rouge"><pre class="highlight"><code>layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}
</code></pre>
</div>

<p>However, <em>any</em> layer able to backpropagate may be given a non-zero <code class="highlighter-rouge">loss_weight</code>, allowing one to, for example, regularize the activations produced by some intermediate layer(s) of the network if desired.
For non-singleton outputs with an associated non-zero loss, the loss is computed simply by summing over all entries of the blob.</p>

<p>The final loss in Caffe, then, is computed by summing the total weighted loss over the network, as in the following pseudo-code:</p>

<div class="highlighter-rouge"><pre class="highlight"><code>loss := 0
for layer in layers:
  for top, loss_weight in layer.tops, layer.loss_weights:
    loss += loss_weight * sum(top)
</code></pre>
</div>


      </section>
  </div>
  </body>
</html>
